8 research outputs found

    Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

    Similar text fragments extraction from weakly formalized data is a task of natural language processing and intelligent data analysis, used to solve the problem of automatically identifying connected knowledge fields. In order to search for such common communities in Wikipedia, we propose to use, as an additional stage, a logical-algebraic model for similar collocations extraction. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can serve as an indication of key common up-to-date Wikipedia communities.
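
    As a rough illustration of the collocation pipeline described above, the following sketch POS-tags a collocation and expands each content word through its WordNet synsets. It uses NLTK as a stand-in for the Stanford tools named in the abstract; the function name and example phrase are illustrative assumptions, not the authors' code.

        # Hedged sketch: generate synonymous variants of a collocation by
        # POS-tagging its words and expanding them through WordNet synsets.
        # NLTK stands in for the Stanford POS tagger / dependency parser.
        from itertools import product

        import nltk
        from nltk import pos_tag, word_tokenize
        from nltk.corpus import wordnet as wn

        for pkg in ("punkt", "averaged_perceptron_tagger", "wordnet"):
            nltk.download(pkg, quiet=True)

        # Map Penn Treebank tag prefixes to WordNet POS classes.
        PENN_TO_WN = {"N": wn.NOUN, "V": wn.VERB, "J": wn.ADJ, "R": wn.ADV}

        def synonym_variants(collocation):
            """Return synonymous variants of a collocation via WordNet."""
            per_word = []
            for word, tag in pos_tag(word_tokenize(collocation)):
                wn_pos = PENN_TO_WN.get(tag[0])
                synonyms = {word.lower()}
                if wn_pos:
                    for synset in wn.synsets(word, pos=wn_pos):
                        synonyms.update(
                            lemma.name().replace("_", " ")
                            for lemma in synset.lemmas()
                        )
                per_word.append(sorted(synonyms))
            return {" ".join(combo) for combo in product(*per_word)}

        print(sorted(synonym_variants("information space"))[:10])

    Counting how often such variants co-occur across articles would then approximate the frequency analysis the abstract reports.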

    The Didactic Potential of a Virtual Information Educational Environment as a Means of Teaching Geography Students

    The article clarifies the concept of a “virtual information educational environment” (VIEE) and examines the researchers’ views on its meaning as presented in the scientific literature. The article determines the didactic potential of the virtual information educational environment for training geography students, based on an analysis of the authors’ experience of blended learning by means of Google Classroom. It also specifies its features (immersion, interactivity, dynamism, sense of presence, continuity, and causality). The authors highlight the advantages of implementing a virtual information educational environment, such as: increased efficiency of the educational process by intensifying cognition and interpersonal interactive communication; continuous access to multimedia content both in Google Classroom and beyond; saving students’ time owing to the absence of any need to work through the training material “manually”; availability of the virtual pages of the virtual class; individualization of the educational process; formation of the informational culture of geography students; and more productive learning of the educational material by means of IT educational facilities. Among the disadvantages, the article mentions the low level of computerization, the insignificant quantity and low quality of software products, the underestimation of the role of VIEE in the professional training of geography students, the lack of economic incentives, etc.

    Number of Wikipedia articles that have a certain number of language versions in particular topics.

    For each topic in Wikipedia, the number of articles that were translated into a given number of languages (logarithmic scale on the vertical axis).

    Main Influencing Factors of Quality Determination of Collaborative Open Data Pages

    Collaborative knowledge bases allow anyone to create and edit information online. One example of a resource with collaborative content is Wikipedia. Despite the fact that this free encyclopedia is one of the most popular sources of information in the world, it is often criticized for the poor quality of its content. Articles in Wikipedia in different languages on the same topic can be created and edited independently of each other. Some of these language versions can provide very different but valuable information on each topic. Measuring the quality of articles using metrics is intended to make open data pages such as Wikipedia more reliable and trustworthy. A major challenge is that the ‘gold standard’ for determining the quality of an open data page is unknown. Therefore, we investigated which factors influence the potential for quality determination of collaborative open data pages and their sources. Our model is based on empirical data derived from the experience of international experts on knowledge management and data quality. It was developed using semi-structured interviews and a qualitative content analysis based on Grounded Theory (GT). Important influencing factors are: Better outcomes, Better decision making, Limitations, More efficient workflows for article creation and review, Process efficiency, Quality improvement, and Reliable and trustworthy utilization of data.

    Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics

    On Wikipedia, articles about various topics can be created and edited independently in each language version. Therefore, the quality of information about the same topic depends on the language. Any interested user can improve an article, and that improvement may depend on the popularity of the article. The goal of this study is to show which topics are best represented in different language versions of Wikipedia, using the results of quality assessment for over 39 million articles in 55 languages. In this paper, we also analyze how popular selected topics are among readers and authors in various languages. We used two approaches to assign articles to various topics. First, we selected 27 main multilingual categories and analyzed all their connections with sub-categories based on information extracted from over 10 million categories in 55 language versions. To classify the articles into one of the 27 main categories, we took into account over 400 million links from articles to over 10 million categories and over 26 million links between categories. In the second approach, we used data from DBpedia and Wikidata. We also showed how the results of the study can be used to build local and global rankings of Wikipedia content.
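
    The first approach reads like a label-propagation problem over the category graph; a minimal sketch under that reading (our simplification, with illustrative toy data structures, not the authors' implementation) could look like this:

        # Hedged sketch of the category-graph approach: propagate the main
        # category labels breadth-first down subcategory links, then label
        # articles through their article->category links.
        from collections import deque

        def propagate_topics(main_categories, subcategory_links):
            """Map each reachable category to the main category above it."""
            topic_of = {c: c for c in main_categories}
            queue = deque(main_categories)
            while queue:
                cat = queue.popleft()
                for sub in subcategory_links.get(cat, ()):
                    if sub not in topic_of:      # first discovery wins
                        topic_of[sub] = topic_of[cat]
                        queue.append(sub)
            return topic_of

        def classify_articles(article_categories, topic_of):
            """Assign each article the topic of its first matched category."""
            labels = {}
            for article, cats in article_categories.items():
                for cat in cats:
                    if cat in topic_of:
                        labels[article] = topic_of[cat]
                        break
            return labels

        # Toy data; the real graph has tens of millions of links.
        links = {"Science": ["Physics"], "Physics": ["Quantum mechanics"]}
        topics = propagate_topics(["Science", "Arts"], links)
        print(classify_articles({"Qubit": ["Quantum mechanics"]}, topics))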

    Cryptocurrencies Perception Using Wikipedia and Google Trends

    In this research, we present different approaches to investigating the possible relationships between the largest crowd-based knowledge source and the market potential of particular cryptocurrencies. Identification of such relations is crucial because their existence may be used to create a broad spectrum of analyses and reports about cryptocurrency projects and to obtain a comprehensive outlook of the blockchain domain. Activities on the blockchain reach different levels of anonymity, which makes them hard objects of study. In particular, the standard tools used to characterize social trends and the variables that describe cryptocurrencies’ situations are unsuitable in an environment that extensively employs cryptographic techniques to hide real users. The employment of Wikipedia to trace the value of crypto assets needs examination because the portal allows the gathering of different opinions: the content of the articles is edited by a group of people. Consequently, the information can be more attractive and useful for readers than in the case of non-collaborative sources of information. Wikipedia articles often appear in the premium positions of search engines such as Google, Bing, and Yahoo. One may expect different demand for information about a particular cryptocurrency depending on different events (e.g., sharp fluctuations of its price). Wikipedia offers information only about cryptocurrencies that are important from the point of view of a given Wikipedia language community. This “filter” helps to better identify those cryptocurrencies that have a significant influence on regional markets. The models encompass linkages between different variables and properties. In one model, cryptocurrency projects are ranked by means of article sentiment and quality. In another model, Wikipedia visits are linked to cryptocurrencies’ popularity. Additionally, the interactions between information demand in different Wikipedia language versions are elaborated; they are used to assess the geographical standing of certain crypto coins. The information about the legal status of cryptocurrency technologies in different states that is offered by Wikipedia is used in another proposed model, which allows assessment of the adoption of cryptocurrencies in a given jurisdiction. Finally, a model is developed that joins Wikipedia article edits and deletions with the social sentiment towards particular cryptocurrency projects. These analytical capabilities, which permit assessment of the popularity of blockchain technologies in different local communities, are not the only results of the paper. The models can also show which countries have the biggest demand for particular cryptocurrencies, such as Bitcoin, Ethereum, Ripple, Bitcoin Cash, Monero, Litecoin, Dogecoin, and others.
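
    One building block of these models, information demand measured as article page views per language version, can be reproduced with the public Wikimedia Pageviews REST API. The endpoint below is the real one; the chosen articles, dates, and comparison logic are merely illustrative:

        # Hedged sketch: compare reader demand for a cryptocurrency article
        # across Wikipedia language versions via the Wikimedia Pageviews API.
        import requests

        API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/"
               "per-article/{project}/all-access/user/{article}"
               "/monthly/{start}/{end}")

        def total_views(project, article, start, end):
            """Sum monthly page views for one article on one project."""
            url = API.format(project=project, article=article,
                             start=start, end=end)
            resp = requests.get(url, headers={"User-Agent": "demo-script"},
                                timeout=30)
            resp.raise_for_status()
            return sum(item["views"] for item in resp.json()["items"])

        # Article titles per language are assumptions; in practice they
        # should be resolved through interwiki links or Wikidata.
        for project in ("en.wikipedia", "de.wikipedia", "pl.wikipedia"):
            print(project, total_views(project, "Bitcoin",
                                       "2023010100", "2023033100"))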

    Relative Quality and Popularity Evaluation of Multilingual Wikipedia Articles

    Despite the fact that Wikipedia is often criticized for its poor quality, it continues to be one of the most popular knowledge bases in the world. Articles in this free encyclopedia on various topics can be created and edited in about 300 different language versions independently. Our research has shown that for language-sensitive topics, the quality of information can be relatively better in the relevant language versions. However, in most cases it is difficult for Wikipedia readers to determine the language affiliation of the described subject. Additionally, each language edition of Wikipedia can have its own rules for manually assessing the quality of content. There are also differences in grading schemes between language versions: some use a 6–8 grade system to assess articles, while others are limited to 2–3 grades. This makes automatic quality comparison of articles between various languages a challenging task, particularly if we take into account the large number of unassessed articles; some Wikipedia language editions have over 99% of articles without a quality grade. The paper presents the results of a relative quality and popularity assessment of over 28 million articles in 44 selected language versions. A comparative analysis of the quality and popularity of articles on popular topics was also conducted. Additionally, the correlation between the quality and popularity of Wikipedia articles on selected topics in various languages was investigated. The proposed method allows us to find articles with information of better quality that can be used to automatically enrich other language editions of Wikipedia.
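
    A natural way to handle the mismatched grading schemes mentioned above is to map each edition's ordinal grades onto a common relative scale. The sketch below is our own construction under that idea, not the paper's exact measure, and the grade ladders are illustrative:

        # Hedged sketch: normalize per-language ordinal grades to a common
        # 0-100 relative quality score. The grade ladders are illustrative;
        # real editions differ, and most articles carry no grade at all.
        GRADE_LADDERS = {
            "en": ["Stub", "Start", "C", "B", "GA", "A", "FA"],  # 7 grades
            "de": ["normal", "lesenswert", "exzellent"],         # 3 grades
        }

        def relative_quality(lang, grade):
            """Scale a grade's position in its ladder to 0-100."""
            ladder = GRADE_LADDERS[lang]
            return 100.0 * ladder.index(grade) / (len(ladder) - 1)

        print(relative_quality("en", "GA"))          # about 66.7
        print(relative_quality("de", "lesenswert"))  # 50.0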

    Artificial intelligence: friend or foe in fake news campaigns

    In this paper, the impact of large language models (LLMs) on the fake news phenomenon is analysed. On the one hand, decent text-generation capabilities can be misused for mass fake news production. On the other, LLMs trained on huge volumes of text have already accumulated information on many facts, so one may assume they could be used for fact-checking. Experiments were designed and conducted to verify how closely LLM responses are aligned with actual fact-checking verdicts. The research methodology consists of the preparation of an experimental dataset and a protocol for interacting with ChatGPT, currently the most sophisticated…
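
    The interaction protocol sketched in the abstract, prompting the model with a claim and comparing its answer against the fact-checkers' verdict, might look roughly like the following. The prompt wording, label set, and model name are our assumptions layered on top of the real OpenAI Python client:

        # Hedged sketch of the evaluation protocol: ask an LLM for a verdict
        # on a claim and measure agreement with fact-checking labels. Prompt,
        # labels, and model are illustrative, not the paper's exact setup.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        LABELS = ("true", "false", "unverifiable")

        def llm_verdict(claim):
            """Classify one claim with a single-word verdict."""
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system",
                     "content": "You are a fact-checker. Answer with exactly "
                                "one word: " + ", ".join(LABELS) + "."},
                    {"role": "user", "content": claim},
                ],
            )
            return response.choices[0].message.content.strip().lower()

        # Toy stand-in for a dataset of claims with fact-checking verdicts.
        dataset = [("The Earth orbits the Sun.", "true")]
        hits = sum(llm_verdict(claim) == verdict for claim, verdict in dataset)
        print(f"alignment: {hits / len(dataset):.0%}")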